Multivariate Generalizability Analyses of Mixed-Format Advanced Placement Exams
Abstract
When combining item types within assessments, it is important to consider the reliability of the scores for each item type as well as the reliability of the composite scores. This study used multivariate generalizability theory to investigate how scoring procedures, section weights, and the number of items per section affect reliability, error variance, and conditional standard errors of measurement. Results indicate that the multiple-choice scoring method (number-correct vs. formula scoring) may affect the reliability of both the multiple-choice section and the composite score, and may lead to different optimal section weights. Although the multiple-choice section operationally contributes between 50 and 60 percent of the composite score points, weighting that is optimal from a reliability or error-variance perspective would raise the multiple-choice contribution to the composite to 80 or 90 percent. Increasing the number of free-response items often improves composite reliability, but administration time limits how many free-response items can be given. The results of this study are necessarily specific to the Advanced Placement (AP) Biology and AP World History Exams considered, but they illustrate the usefulness of generalizability theory for answering the test-development questions that arise with mixed-format exams.
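The composite-reliability and optimal-weight questions above can be made concrete with a small sketch. It assumes a two-section exam analyzed with a multivariate p• × i° design (persons crossed with both fixed sections, items nested within sections), section mean scores combined with nominal weights w_MC + w_FR = 1, and relative rather than absolute error. Under those assumptions the composite universe-score variance is σ²_C(τ) = Σ_v Σ_v' w_v w_v' σ_vv'(p), the composite relative error variance is σ²_C(δ) = Σ_v w_v² σ²_v(pi) / n'_v (there is no cross-section error covariance because the sections share no items), and the composite generalizability coefficient is Eρ² = σ²_C(τ) / [σ²_C(τ) + σ²_C(δ)]. The variance components, item counts, and helper names (`Section`, `composite_g`, `best_mc_weight`) are hypothetical; they are not the study's estimates, and the sketch does not model formula scoring or conditional standard errors of measurement.

```python
# Hypothetical sketch: composite generalizability coefficient for a
# two-section mixed-format test under a multivariate p• x i° design.
# All numbers are illustrative assumptions, not estimates from the AP data.
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    var_p: float   # universe-score variance, sigma^2_v(p)
    var_pi: float  # person-by-item residual variance, sigma^2_v(pi)
    n_items: int   # D-study number of items, n'_v


def composite_g(mc, fr, cov_p, w_mc):
    """Return (tau, delta, E rho^2) for the composite w_mc*MC + (1 - w_mc)*FR.

    cov_p is the universe-score covariance sigma_12(p) between sections.
    Items are nested within sections, so relative errors do not covary.
    """
    w_fr = 1.0 - w_mc
    tau = w_mc**2 * mc.var_p + w_fr**2 * fr.var_p + 2 * w_mc * w_fr * cov_p
    delta = w_mc**2 * mc.var_pi / mc.n_items + w_fr**2 * fr.var_pi / fr.n_items
    return tau, delta, tau / (tau + delta)


def best_mc_weight(mc, fr, cov_p, step=0.01):
    """Grid-search the MC weight in [0, 1] that maximizes E rho^2."""
    best_w, best_rho = 0.0, float("-inf")
    for k in range(int(round(1 / step)) + 1):
        w = k * step
        rho = composite_g(mc, fr, cov_p, w)[2]
        if rho > best_rho:
            best_w, best_rho = w, rho
    return best_w, best_rho


# Hypothetical variance components (section scores as proportions of max points).
mc = Section("multiple-choice", var_p=0.020, var_pi=0.200, n_items=100)
fr = Section("free-response", var_p=0.020, var_pi=0.030, n_items=4)
cov_p = 0.016  # implies a universe-score correlation of 0.8

for w in (0.5, 0.6, 0.8, 0.9):
    print(f"w_MC = {w:.1f}: E rho^2 = {composite_g(mc, fr, cov_p, w)[2]:.3f}")

w_star, rho_star = best_mc_weight(mc, fr, cov_p)
print(f"optimal w_MC = {w_star:.2f}, E rho^2 = {rho_star:.3f}")

# Effect of adding free-response items at a fixed 50/50 weighting.
for n_fr in (4, 6, 8):
    fr_n = Section("free-response", fr.var_p, fr.var_pi, n_fr)
    print(f"n_FR = {n_fr}: E rho^2 = {composite_g(mc, fr_n, cov_p, 0.5)[2]:.3f}")
```

A grid search stands in for a closed-form solution because there is only one free weight, so a 0.01 step is cheap. Nominal weights on section mean scores are not the same thing as a section's percentage contribution to composite score points, so the weight this sketch returns should not be read directly as the 50 to 60 or 80 to 90 percent figures in the abstract.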
Similar articles
Modeling Measurement Facets and Assessing Generalizability in a Large-Scale Writing Assessment
Measurement error and reliability are two important psychometric properties for large-scale assessments. Generalizability theory has often been used to identify sources of error and to estimate score reliability. The complicated nature of sparse matrix data collection designs in some assessments, however, can cause challenges in conducting generalizability analyses. The present study examines p...
Temporal stability of objective structured clinical exams: a longitudinal study employing item response theory
BACKGROUND The objective structured clinical examination (OSCE) has been used since the early 1970s for assessing clinical competence. There are very few studies that have examined the psychometric stability of the stations that are used repeatedly with different samples. The purpose of the present study was to assess the stability of objective structured clinical exams (OSCEs) employing the sam...
Advanced and upper-intermediate EFL learners’ reciprocity to mediation: A dynamic listening assessment
The present study aimed to capture and represent the mediator-learner’s interaction in the development of listening proficiency and statistically compare this interaction between high and low proficient English as a foreign language (EFL) learners. Thirty EFL learners took the Oxford Quick Placement Test (OQPT) and the Interactions/Mosaic Listening Placement (IMLP) Test to select those wh...
Constructing licensure exams: a reliability study of case-based questions on the National Board Dental Hygiene Examination.
Patient cases with associated questions are a method for increasing the clinical relevance of licensure exams. This study used generalizability theory to assess changes in score reliability when the number of questions per case varied in the National Board Dental Hygiene Examination (NBDHE). The experimental design maintained the same total number of case-based items, while varying the number o...
Classification Accuracy of Mixed Format Tests: A Bi-Factor Item Response Theory Approach
Mixed format tests (e.g., a test consisting of multiple-choice [MC] items and constructed response [CR] items) have become increasingly popular. However, the latent structure of item pools consisting of the two formats is still equivocal. Moreover, the implications of this latent structure are unclear: For example, do constructed response items tap reasoning skills that cannot be assessed with ...